Enable manifest-scheduling on autoland
Categories
(Firefox Build System :: Task Configuration, task, P1)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 1 open bug, Regressed 1 open bug)
Details
Attachments
(9 files)
Now that the initial implementation of 'manifest-scheduling' has landed, this bug will track turning it on for autoland.
Solving backfills will be the major blocker here, though we'll also need to ensure we don't regress Push Health in a major way.
Comment 1•5 years ago
|
||
To avoid regressing the quality of sheriffs' classifications, we should probably:
- enforce one backout per push (so we avoid https://github.com/mozilla/mozci/issues/204). We might want to do bug 1636440 before enforcing;
- organize a "training" session with sheriffs to explain the changes.
Assignee | ||
Comment 2•5 years ago
|
||
Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.
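
For readers unfamiliar with how a per-project setting like this can be expressed, here is a minimal sketch. It is not the patch itself: the `PER_PROJECT_OVERRIDES` table, the `test_manifest_loader` key, the `"default"` fallback, and the helper name are all assumptions made for illustration.

```python
# Hypothetical per-project overrides. The table, key name, and values are
# illustrative assumptions, not the contents of the actual patch.
PER_PROJECT_OVERRIDES = {
    "autoland": {"test_manifest_loader": "bugbug"},
}

# Assumed name for the loader used when a project has no override.
DEFAULT_LOADER = "default"


def manifest_loader_for(project):
    """Return the manifest loader name configured for `project`."""
    overrides = PER_PROJECT_OVERRIDES.get(project, {})
    return overrides.get("test_manifest_loader", DEFAULT_LOADER)
```

Keeping the override in one table means ending the trial is a one-line revert.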
Assignee | ||
Comment 3•5 years ago
|
||
We're planning to enable this tomorrow for a trial run to get a sense of:
A) Is everything working as it should (since this is hard to test on try).
B) How much of an impact does this have on sheriffing (and what we need to do to fix it).
We'll run the experiment until Thursday, July 30th, or until it's obvious that it makes sheriffing too difficult (which may take only an hour or two). If sheriffs have no complaints and everything goes smoothly, there's a small chance that we'll leave it enabled past the end date.
Comment 5•5 years ago
|
||
Backed out changeset 9be5f086895c (bug 1643689) for busting gecko decision task and causing bug 1655807
Backout link: https://hg.mozilla.org/integration/autoland/rev/153accc0eb12651fa1b2d19ec1dc89c6cc6477d3
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=311287166&repo=autoland
...
[task 2020-07-28T16:24:16.329Z] Generating tasks for release-update-verify-next firefox-next-win32
[task 2020-07-28T16:24:16.329Z] Generated 0 tasks for kind release-update-verify-next
[task 2020-07-28T16:24:16.369Z] Generating full task graph
[task 2020-07-28T16:24:16.448Z] Full task graph contains 24419 tasks and 105201 dependencies
[task 2020-07-28T16:24:21.768Z] PERFHERDER_DATA: {"suites": [{"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 20.07702398099991, "name": "bugbug_push_schedules_time"}, {"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 2, "name": "bugbug_push_schedules_retries"}], "framework": {"name": "build_metrics"}}
[task 2020-07-28T16:24:21.768Z] Traceback (most recent call last):
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/mach_commands.py", line 205, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z] return taskgraph.decision.taskgraph_decision(options)
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 251, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z] full_task_json = tgg.full_task_graph.to_json()
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 163, in full_task_graph
[task 2020-07-28T16:24:21.768Z] return self._run_until('full_task_graph')
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 374, in _run_until
[task 2020-07-28T16:24:21.768Z] k, v = next(self._run)
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 304, in _run
[task 2020-07-28T16:24:21.768Z] yield verifications('full_task_graph', full_task_graph, graph_config, parameters)
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 58, in __call__
[task 2020-07-28T16:24:21.768Z] parameters=parameters,
[task 2020-07-28T16:24:21.768Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 364, in verify_test_packaging
[task 2020-07-28T16:24:21.768Z] raise Exception("\n".join(exceptions))
[task 2020-07-28T16:24:21.768Z] Exception: Build job build-linux64-tsan/opt has no tests, but specifies MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment. Unset MOZ_AUTOMATION_PACKAGE_TESTS in the task definition to fix.
[taskcluster 2020-07-28 16:24:23.303Z] === Task Finished ===
[taskcluster 2020-07-28 16:24:44.491Z] Unsuccessful task run with exit code: 1 completed in 198.821 seconds
Assignee | ||
Comment 6•5 years ago
|
||
We decided to back out the trial. The issue happened because the algorithm decided that no tests needed to run against that build, which tripped this check:
https://searchfox.org/mozilla-central/rev/d9f92154813fbd4a528453c33886dc3a74f27abb/taskcluster/taskgraph/util/verify.py#358
I think we may need to disable this check when manifest-scheduling mode is enabled.
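
As a rough illustration of the failure in comment 5 and the fix suggested here, the sketch below re-creates the shape of such a check in plain Python. It is not the code in `verify.py`: the `Task` class, the `manifest_scheduling` flag, and the function name are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task graph entry (not the real class)."""
    label: str
    kind: str  # "build" or "test" in this sketch
    env: dict = field(default_factory=dict)
    build_label: str = ""  # for test tasks: label of the build they use


def verify_test_packaging_sketch(tasks, manifest_scheduling=False):
    """Raise if a build packages tests that no test task consumes.

    `manifest_scheduling` is an assumed flag. When it is set, a build
    without tests is an expected outcome (the scheduler may have decided
    no tests were needed), so the check is skipped entirely.
    """
    if manifest_scheduling:
        return

    # Labels of all builds that at least one test task depends on.
    builds_with_tests = {t.build_label for t in tasks if t.kind == "test"}

    errors = []
    for task in tasks:
        if task.kind != "build":
            continue
        packages_tests = task.env.get("MOZ_AUTOMATION_PACKAGE_TESTS") == "1"
        if packages_tests and task.label not in builds_with_tests:
            errors.append(
                "Build job {} has no tests, but specifies "
                "MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment.".format(
                    task.label
                )
            )
    if errors:
        raise Exception("\n".join(errors))
```

With the flag off, a build such as `build-linux64-tsan/opt` that sets `MOZ_AUTOMATION_PACKAGE_TESTS=1` but has no dependent test task raises. With the flag on, the same graph passes.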
Comment 8•5 years ago
|
||
bugherder |
Comment 9•5 years ago
|
||
disable 1st round of manifest scheduling
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
bugherder |
Assignee | ||
Comment 12•5 years ago
|
||
The dict needs to be passed to the last two substrategies, not just the last
one.
Assignee | ||
Comment 13•5 years ago
|
||
Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.
Depends on D90159
Comment 14•5 years ago
|
||
Comment 15•5 years ago
|
||
bugherder |
Comment 16•5 years ago
|
||
Comment 17•5 years ago
|
||
Backed out changeset 0b196026ed59 (Bug 1643689) for causing issues with manifest scheduling.
Here we can see failed backfill tasks:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315918092&repo=autoland&lineNumber=50
Also failed "dt" jobs:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315917273&repo=autoland&lineNumber=2041
Comment 18•5 years ago
|
||
The backout also seems to have fixed these a11y failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=linux%2C18.04%2Cx64%2Cdebug%2Cmochitests%2Cwithout%2Ce10s%2Ctest-linux1804-64%2Fdebug-mochitest-a11y-1proc%2Ca11y&fromchange=75f7048e9d0af7f1ef5ff34fc6a807ecfb0fbd86&test_paths=accessible%2Ftests%2Fmochitest%2F&tochange=338bbaf179ae30128893b2acf18ba7e2b417034d&selectedTaskRun=UjVLUNLmRK-RWoRwPAysUQ.0
Assignee | ||
Comment 19•5 years ago
|
||
This was causing |mach try auto| to stop selecting manifests.
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
bugherder |
Comment 23•5 years ago
|
||
Assignee | ||
Comment 24•5 years ago
|
||
Comment 25•5 years ago
|
||
Comment 26•5 years ago
|
||
bugherder |
Assignee | ||
Comment 27•5 years ago
|
||
Assignee | ||
Comment 28•5 years ago
|
||
Depends on D91587
Assignee | ||
Comment 29•5 years ago
|
||
When enabling manifest scheduling, several interdependencies between tests were
revealed, resulting in too many new intermittents. Make sure we disable
manifest-scheduling there for now.
Depends on D91588
Comment 30•5 years ago
|
||
Comment 31•5 years ago
|
||
Backed out 3 changesets (bug 1643689) for Gecko Decision Task failure. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=317051153&repo=autoland&lineNumber=1801
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=2912d91dd291de83873211fa4d017d6546551322
Backout:
https://hg.mozilla.org/integration/autoland/rev/0ca25be8f4f8d43e3673dcd545689c3e1663fab0
Assignee | ||
Comment 32•5 years ago
|
||
This is very bizarre: I couldn't reproduce on try, and I can't reproduce locally, even on the exact same base revision and using parameters.yml from autoland.
I also tried running it with an earlier Python version in case that was the issue, but still no luck.
Assignee | ||
Comment 33•5 years ago
|
||
facepalm
It's because I had already fixed the issue locally, but I guess I never ended up submitting the changes to Phabricator.
Comment 34•5 years ago
|
||
Comment 35•5 years ago
|
||
bugherder |
Comment 36•5 years ago
|
||
Comment 37•5 years ago
|
||
bugherder |
Assignee | ||
Comment 39•5 years ago
|
||
I believe we are all done here. Regressions / follow-up work is all tracked in other bugs.